Location: KST Cisitu (Samaun Samadikun)
🔬 Internship/Research Topic Description
Title:
Multitask Computer Vision and Image Processing for Ecological Intelligence, Visual Restoration, Species Recognition, Multilingual Text Detection, and Facial Activity Understanding
---
General Overview:
This research and internship topic focuses on the application of modern computer vision and image processing techniques across multiple real-world domains. The project spans biodiversity monitoring, shadow removal in natural images, automatic fish species classification (specifically tuna), multilingual text detection, and the recognition of public figures along with their inferred activities from static images.
This integrated approach allows students to explore multitask deep learning, domain-specific dataset curation, and the deployment of AI systems in practical environments such as conservation, cultural documentation, fisheries, and media analysis.
---
Subtopics and Focus Areas:
1. Biodiversity Detection and Monitoring
Objective: Detect and classify flora and fauna from terrestrial, aerial, or underwater imagery.
Applications: Environmental conservation, national park monitoring, wildlife detection.
Techniques: Object detection (YOLOv8, EfficientDet), semantic segmentation.
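For illustration, a minimal inference sketch using the Ultralytics YOLOv8 API is shown below; the checkpoint name `wildlife_yolov8.pt` and the input image are hypothetical placeholders for a detector fine-tuned on a flora/fauna dataset.

```python
# Minimal YOLOv8 inference sketch for wildlife detection.
# Assumes the `ultralytics` package is installed; "wildlife_yolov8.pt"
# is a hypothetical checkpoint fine-tuned on a flora/fauna dataset.
from ultralytics import YOLO

model = YOLO("wildlife_yolov8.pt")           # load the fine-tuned detector
results = model("camera_trap_frame.jpg")     # run inference on one image

for r in results:
    for box in r.boxes:
        cls_id = int(box.cls[0])             # predicted class index
        conf = float(box.conf[0])            # detection confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding-box corners
        print(f"{model.names[cls_id]}: {conf:.2f} at ({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```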
2. Shadow Removal in Natural Images
Objective: Improve visual clarity and data consistency by eliminating shadows from images.
Applications: Preprocessing for object detection, restoration in historical or outdoor images.
Techniques: Deep CNNs, GANs, image decomposition.
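As a simple non-learned baseline (a sketch is given below), the shading layer can be estimated per channel and divided out with OpenCV; the deep CNN/GAN approaches listed above would replace this step with learned models. File names are placeholders.

```python
# Classical illumination-normalisation baseline (not a learned model):
# estimate a smooth shading layer per channel and divide it out.
import cv2
import numpy as np

img = cv2.imread("shadowed_scene.jpg")
planes = []
for channel in cv2.split(img):
    # Rough illumination estimate via dilation + large median blur
    background = cv2.medianBlur(cv2.dilate(channel, np.ones((7, 7), np.uint8)), 21)
    # Divide out the shading and rescale back to the 8-bit range
    normalised = cv2.divide(channel, background, scale=255)
    planes.append(normalised)

cv2.imwrite("deshadowed.jpg", cv2.merge(planes))
```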
3. Tuna Species Classification
Objective: Recognize and classify tuna species from fisheries-related photos.
Applications: Smart fisheries, automated catch logging, sustainable seafood tracking.
Techniques: Transfer learning (ResNet, MobileNet), image classification pipelines.
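A possible starting point is the transfer-learning sketch below, which trains only a new classification head on top of a frozen ImageNet-pretrained ResNet-50; the dataset path, species count, and single training pass are illustrative placeholders.

```python
# Transfer-learning sketch for tuna species classification in PyTorch.
# Assumes an ImageFolder-style dataset: "tuna_dataset/train/<species>/*.jpg".
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_SPECIES = 5  # placeholder, e.g. yellowfin, bigeye, skipjack, albacore, bluefin

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("tuna_dataset/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():                 # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)  # new classification head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                # one pass shown; train for several epochs
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```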
4. Multilingual Text Detection and Recognition
Objective: Extract and recognize text from natural scenes containing various global scripts.
Applications: Cultural preservation, automated translation, smart signage, tourism.
Techniques: OCR (Tesseract, EasyOCR, TrOCR), multilingual datasets (Latin, Arabic, Chinese, Cyrillic, etc.).
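A minimal scene-text reading sketch with EasyOCR is given below; the language pair and image path are illustrative, and not every script combination can share a single reader.

```python
# Multilingual scene-text detection + recognition sketch using EasyOCR.
import easyocr

# EasyOCR loads detection and recognition models for the requested scripts;
# Latin + Arabic shown here as one compatible combination.
reader = easyocr.Reader(['en', 'ar'])
results = reader.readtext("street_sign.jpg")

for bbox, text, confidence in results:
    print(f"{text!r} (confidence {confidence:.2f}) at {bbox}")
```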
5. Face Detection and Public Figure Recognition
Objective: Detect faces and recognize public figures from photographs and video frames.
Applications: Archival image labeling, media automation, digital heritage.
Techniques: FaceNet, ArcFace, DeepFace, facial embeddings, Dlib.
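The sketch below shows one way to match query faces against a labelled reference gallery with the deepface library, which exposes ArcFace and FaceNet as embedding backends; the folder `public_figures_db/` is a hypothetical directory with one sub-folder of reference photos per person.

```python
# Face recognition sketch using the deepface library.
# "public_figures_db/" is a hypothetical gallery of labelled reference photos;
# deepface builds embeddings for it on first use and matches query faces against them.
from deepface import DeepFace

matches = DeepFace.find(
    img_path="press_photo.jpg",
    db_path="public_figures_db/",
    model_name="ArcFace",        # embedding model; FaceNet is also supported
    enforce_detection=False,      # do not fail if a face is ambiguous
)

# In recent deepface versions, `find` returns one DataFrame per detected face,
# with rows sorted by embedding distance.
for df in matches:
    if not df.empty:
        print(df.iloc[0]["identity"])  # path of the closest reference image
```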
6. Visual Activity Recognition from Faces
Objective: Identify actions or expressions (e.g., speaking, reading, smiling) from facial landmarks and image context.
Applications: Political speech analysis, behavioral inference, multimedia tagging.
Techniques: CNN + LSTM, OpenPose, facial keypoint detection.
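For the CNN + LSTM route, a minimal PyTorch model sketch is shown below: per-frame features from a small CNN backbone are pooled over time by an LSTM and classified into activities. All layer sizes and the three activity classes are illustrative placeholders.

```python
# Minimal CNN + LSTM sketch for frame-sequence activity recognition.
import torch
import torch.nn as nn
from torchvision import models

class FaceActivityNet(nn.Module):
    """Per-frame CNN features pooled over time by an LSTM."""
    def __init__(self, num_activities=3, hidden=256):
        super().__init__()
        backbone = models.mobilenet_v3_small(weights="DEFAULT")
        self.cnn = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
        self.lstm = nn.LSTM(input_size=576, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_activities)  # e.g. speaking / reading / smiling

    def forward(self, clips):                    # clips: (batch, frames, 3, H, W)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 576)
        _, (hidden_state, _) = self.lstm(feats.reshape(b, t, -1))
        return self.head(hidden_state[-1])       # logits per activity class

logits = FaceActivityNet()(torch.randn(2, 8, 3, 224, 224))  # dummy clip batch
print(logits.shape)  # torch.Size([2, 3])
```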
Suitable for:
S1 (Bachelor): Single-domain implementation, small dataset, final report, and prototype system.
S2 (Master): Deep learning integration, custom datasets, thesis, system benchmarking, and scientific paper.
S3 (Doctoral): Multitask learning, explainable AI, multimodal vision, dissertation, published research, and open-source framework.
Technical Stack:
Languages & Frameworks: Python, OpenCV, PyTorch, TensorFlow, Keras
Tools: Roboflow, Label Studio, Google Colab, HuggingFace
Pretrained Models: YOLOv8, DeepFace, TrOCR, Swin Transformer
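As an example of how the listed pretrained models plug into the HuggingFace tooling, the sketch below loads a public TrOCR checkpoint and transcribes a cropped text-line image; the checkpoint and file name are illustrative.

```python
# TrOCR inference sketch via the HuggingFace transformers library.
# "microsoft/trocr-base-printed" is one public checkpoint; TrOCR expects a
# cropped image of a single text line.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("cropped_text_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```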
---
Learning Outcomes:
Gain hands-on experience in multitask computer vision workflows
Apply AI to real-world ecological and socio-cultural challenges
Publish research or deploy usable AI tools for environmental and public data applications
Application Requirements:
- Photo
- Academic transcript (file)
- Cover Letter
- Curriculum Vitae
- Statement Letter
- Other supporting files
- Proposal